Introduction to Software Security
What is computer security?
- Systems may fail for many reasons
- Reliability deals with accidental failures
- Usability deals with problems arising from operating mistakes by users
- Security deals with intentional failures created by intelligent parties
- Computing in the presence of an adversary
Examples of Software Problems
- Therac-25 medical accelerator
- Massive radiation overdoses in six known accidents (1985-1987), several of them fatal
- Mars Climate Orbiter
- Destroyed because of a units mismatch (imperial vs. metric)
- AT&T long distance network
- Switches crashed when they received a certain message
- iPhone bug (2015)
- Receiving a text containing certain characters crashed the phone
Adversarial Failures
- Bugs are bad
- Much worse when someone intentionally tries to exploit bugs
- Force code into worst state
- Violate security system
- Common class of bugs: buffer overflow
- Buffer overflow in Berkeley Unix finger daemon
- Exploited by Morris Worm
Morris Worm (1988)
- Vulnerabilities exploited: sendmail, finger, rsh/rexec, and weak passwords
- The worm infected about 6,000 computers
- First significant worm
Software Vulnerabilities are Everywhere
- Heartbleed (OpenSSL)
- WannaCry (Microsoft)
- Cloudbleed
What is Software Security
- System Model – used by several users simultaneously
- Threat Model = adversary interacts with API provided by software
- Properties = confidentiality, integrity, availability
- Pick two??
Why do software vulnerabilities matter?
- Software bugs are bad
- An attacker who successfully exploits a vulnerability can:
- Crash – no availability
- Execute arbitrary code – no integrity
- Obtain sensitive information – no confidentiality
Common Vulnerabilities
- Buffer overflow
- Integer overflow
- Format string
- Input validation
- Race conditions
Buffer Overflows
What is Buffer Overflow
- Anomalous condition where a process attempts to store data beyond the boundaries of a fixed-length buffer
- This may overwrite adjacent memory locations that hold crucial data, and may result in a program crash, incorrect results, or security and privacy leaks
- Most common problem in C/C++ programs
Example code
- Look at slides
How does memory work
- The memory of a process is divided into three regions
- Text: executable code from program
- Heap: Dynamically allocated data
- Stack: Local variables, function return addresses, stack pointer (grows/shrinks)
How the memory stack works
Anything wrong with this program?
- strcpy does not perform bounds checking!
What if arg is larger than 16 bytes?
The attacker can:
- Crash the program
- Execute arbitrary code
- Call libc functions
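The program under discussion is on the slides and is not reproduced in these notes. A minimal sketch of the kind of code these points describe, assuming a 16-byte stack buffer; the function names `vulnerable_copy` and `checked_copy` are hypothetical:

```c
#include <stdio.h>
#include <string.h>

#define BUF_SIZE 16  /* assumed from the "16 bytes" mentioned in the notes */

/* UNSAFE: strcpy copies until the terminating '\0', however long arg is.
 * If strlen(arg) >= BUF_SIZE, the copy runs past buf and can overwrite
 * whatever lies beyond it on the stack, including the return address. */
void vulnerable_copy(const char *arg) {
    char buf[BUF_SIZE];
    strcpy(buf, arg);
    printf("%s\n", buf);
}

/* Hypothetical safer variant: refuse input that does not fit.
 * Returns 0 on success, -1 if arg plus its '\0' exceeds the buffer. */
int checked_copy(const char *arg) {
    char buf[BUF_SIZE];
    if (strlen(arg) >= sizeof buf) {
        return -1;
    }
    strcpy(buf, arg);  /* safe: the length was checked above */
    printf("%s\n", buf);
    return 0;
}
```

Only `checked_copy` should ever be handed attacker-controlled input; `vulnerable_copy` is shown purely to make the missing bounds check visible.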
Buffer Overflow Summary
- Requires an unsafe function
- strcpy/strcat/strcmp
- gets
- printf/scanf
- memcpy
- Buffer must contain the address of the attack code in the return-address position
- Attacker must know where the buffer will be when the function is called
Software Security – Other Vulnerabilities
Introduction
Stack-based Exploits
- Stack Smashing – like an axe
- Return to libc – steak knife
- Format string – scalpel
Bad Coding Practices
- Unsafe libc functions
- strcpy/strcat/strcmp
- gets
- printf/scanf
- memcpy
- What can we do?
- strncpy/strncat/strncmp
- fgets
- Two steps for finding trivial vulnerabilities
- Look for unsafe functions
- Trace attacker-controlled input to these functions
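The bounded functions still need care: strncpy does not add a terminating '\0' when the source fills the whole buffer. A small sketch of wrappers around strncpy and fgets that always terminate the result; the names `bounded_copy` and `read_line` are hypothetical helpers, not standard functions:

```c
#include <stdio.h>
#include <string.h>

/* Copies src into dst (capacity dst_size) and always NUL-terminates.
 * Returns 1 if the whole string fit, 0 if it was truncated. */
int bounded_copy(char *dst, size_t dst_size, const char *src) {
    if (dst_size == 0) {
        return 0;
    }
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';  /* strncpy alone may leave dst unterminated */
    return strlen(src) < dst_size;
}

/* Reads one line with fgets, which stores at most dst_size - 1 characters
 * plus '\0'. Strips the trailing newline. Returns 1 on success, 0 on EOF
 * or error. */
int read_line(char *dst, size_t dst_size, FILE *in) {
    if (fgets(dst, (int)dst_size, in) == NULL) {
        return 0;
    }
    dst[strcspn(dst, "\n")] = '\0';
    return 1;
}
```

Returning a truncation flag lets the caller decide whether a shortened string is acceptable or should be treated as an error.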
Integer Overflow
Three main types of Integer Overflow
- Assign a large type to a small type
- Arithmetic overflow
- Signedness bug
Large Type / Small Type
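unsigned int i = 0xAABBCCDD;
unsigned short s = i;
unsigned char c = i;
// Narrowing conversion: the high-order bits are silently discarded
s is truncated to the low 16 bits (0xCCDD, with a 16-bit short)
c keeps only the lowest byte (0xDD)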
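struct s {
    unsigned short len;      // 16-bit length field
    char buf[];
};
int len = strlen(str);
struct s *p = malloc(len+3); // 2 bytes for len + string + '\0'
p->len = len;                // BUG: truncated if len > 65535
strcpy(p->buf, str);
char buf[1000];
if (p->len < sizeof buf)     // passes when the stored length was truncated...
    strcpy(buf, p->buf);     // ...but copies the full string: overflow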
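With a 16-bit unsigned short, a 70,000-byte string would be recorded with a length of 4,464, which passes the later size check. One possible fix is to reject strings that do not fit the length field before storing anything. A sketch, where `make_s` is a hypothetical constructor for the struct above:

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

struct s {
    unsigned short len;
    char buf[];  /* flexible array member holding the string */
};

/* Builds a struct s from str. Returns NULL if str is too long for the
 * length field or if allocation fails. The caller frees the result. */
struct s *make_s(const char *str) {
    size_t len = strlen(str);
    if (len > USHRT_MAX) {
        return NULL;  /* would be truncated when stored in p->len */
    }
    struct s *p = malloc(sizeof *p + len + 1);
    if (p == NULL) {
        return NULL;
    }
    p->len = (unsigned short)len;   /* safe: range checked above */
    memcpy(p->buf, str, len + 1);   /* includes the terminating '\0' */
    return p;
}
```

Because the stored length now always equals the real string length, a later check such as `p->len < sizeof buf` can be trusted.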
Arithmetic Overflow
- The result of an arithmetic operation is too large for a variable
- Example
unsigned int a = 0xffffffff;
unsigned int b = 1;
unsigned int c = a + b;
// Here c will be 0: the sum wraps around (with a 32-bit unsigned int)
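Wraparound like this can be detected before it happens by comparing against the type's maximum. A minimal sketch; `checked_add_uint` and `checked_mul_size` are hypothetical helper names:

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Stores a + b in *out and returns 1 if the sum fits in unsigned int;
 * returns 0 (leaving *out untouched) if it would wrap around. */
int checked_add_uint(unsigned int a, unsigned int b, unsigned int *out) {
    if (a > UINT_MAX - b) {
        return 0;
    }
    *out = a + b;
    return 1;
}

/* Same idea for allocation sizes: count * elem_size must fit in size_t. */
int checked_mul_size(size_t count, size_t elem_size, size_t *out) {
    if (elem_size != 0 && count > SIZE_MAX / elem_size) {
        return 0;
    }
    *out = count * elem_size;
    return 1;
}
```

The multiplication check matters most for expressions such as `malloc(count * elem_size)`, where a wrapped product yields an allocation far smaller than the data later written into it.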
Signedness bug
- Compare two signed integers
- Compare signed and unsigned integers
- Treating a signed negative number as unsigned
- Also be careful with type casting
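A typical signedness bug is a length check done on a signed value that is later used as an unsigned size. The sketch below assumes an 80-byte destination buffer; all function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

#define BUF_SIZE 80  /* assumed destination buffer size */

/* BUGGY: only the upper bound is checked, so a negative len passes. */
int buggy_len_ok(int len) {
    return len <= BUF_SIZE;
}

/* What memcpy(dst, src, len) actually receives: len converted to size_t.
 * A negative len becomes a huge unsigned value. */
size_t as_copy_length(int len) {
    return (size_t)len;
}

/* Safer check: reject negatives first, then compare as unsigned. */
int safe_len_ok(int len) {
    return len >= 0 && (size_t)len <= (size_t)BUF_SIZE;
}
```

With `len == -1`, the buggy check succeeds and the copy length becomes `SIZE_MAX`, while the safer check rejects the value.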
Return to LIBC
Why provide our own shellcode?
- OSes already have lots of libraries
- Idea: point to libc instead of back into the stack
- system(); exec*()
- Modify “arguments” in addition to return address
- Depending on situation can “chain” calls
- setuid(); system(…)..
Above example:
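- Instead of returning into program code, the overwritten return address transfers control to a function in libc, with the shell path passed as its argument
- Because system() is entered by a return rather than a call, it behaves as if it had been called normally: it treats the next word on the stack as its own return address and the word after that as its argument. That return address can point to another function, so calls and returns can be chained.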
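The stack layout described here can be written down as data. The sketch below assumes 32-bit x86 with the cdecl calling convention; every address is a placeholder supplied by the caller, and `build_ret2libc_layout` is a hypothetical name used only to illustrate the ordering of the words:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fills out with: pad_len filler bytes, then three 32-bit words in the
 * order the overwritten frame is read:
 *   1. address that replaces the saved return address (e.g. system)
 *   2. the word system() will treat as ITS return address (next call)
 *   3. the word system() will treat as its first argument
 * Words are written in host byte order. Returns the number of bytes
 * written, or 0 if out is too small. */
size_t build_ret2libc_layout(unsigned char *out, size_t out_size,
                             size_t pad_len,
                             uint32_t func_addr,
                             uint32_t next_ret_addr,
                             uint32_t arg_addr) {
    const size_t words = 3 * sizeof(uint32_t);
    if (out_size < words || pad_len > out_size - words) {
        return 0;
    }
    memset(out, 'A', pad_len);
    memcpy(out + pad_len, &func_addr, sizeof func_addr);
    memcpy(out + pad_len + 4, &next_ret_addr, sizeof next_ret_addr);
    memcpy(out + pad_len + 8, &arg_addr, sizeof arg_addr);
    return pad_len + words;
}
```

Chaining works by making the second word the address of another function, whose own return address and arguments follow in the same pattern.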
Format String
Exploit of printf family of functions
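- printf, fprintf, sprintf, snprintf, vprintf, vfprintf
- Each one takes a format string
- %c, %d, %i, %u, %x, %s
- %n = stores the number of characters written so far into the int that the corresponding argument points to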
Ways printf can be exploited:
- Reading the memory stack
- Reading arbitrary memory
- Overwriting memory
Examples
printf("%5d\n", 37); // prints "   37" (padded to width 5)
int p;
printf("hi %d%n", 24, &p); // prints "hi 24", then stores the number of characters written so far (5) into p
This can be exploited as such:
char *evil; // evil = "%08x %08x %08x %08x"
printf(evil); // printf will treat this as a format string
// printf then reads the following stack words as its missing arguments and prints them
How can it be fixed?
- Use a fixed format string
- printf("%s", user_data);
- Don't use %n!
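The effect of the fixed format string can be checked directly: conversion specifiers inside the user data come out as plain characters. A small sketch using snprintf so the result can be inspected; `format_safely` is a hypothetical wrapper name:

```c
#include <stdio.h>

/* Formats untrusted text into out using a fixed "%s" format string.
 * Specifiers inside user_data are copied literally, never interpreted.
 * Returns what snprintf returns: the length the full output would have.
 *
 * The unsafe equivalent would be snprintf(out, out_size, user_data),
 * which lets the data itself act as the format string. */
int format_safely(char *out, size_t out_size, const char *user_data) {
    return snprintf(out, out_size, "%s", user_data);
}
```

Passing `"%08x %08x %n"` as user_data yields exactly that 12-character text in `out`; no stack words are read and nothing is written through `%n`.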
Software Security Defenses
What else can we do?
- Use safe library functions (e.g., strncpy, strncat)
- Use a “safe” language
- Lots of testing
- Static code analysis
- NX/DEP/W^X
- Runtime checks (stack canary)
- Do our own bounds checking
- Shadow stack
- Address obfuscation
Static Source Code Analysis
- Statically check source code to detect buffer overflows (and other vulnerabilities)
- Main idea: automate the code review process
- Find lots of bugs
Possible bugs in source code
- Crash Causing Defects
- Null pointer dereference
- Use after free
- Double free
- Array indexing errors
- Mismatched array new/delete
- Potential stack overrun
- Potential heap overrun
- Return pointers to local variables
- Logically inconsistent code
- Uninitialized variables
- Invalid use of negative values
- Passing large parameters by value
- Underallocations of dynamic data
- Memory leaks
- File handle leaks
- Network resource leaks
- Unused values
- Unhandled return codes
- Use of invalid iterators
Do our own bounds checking
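char buf[80];
void function() {
    int len = read_int_in();
    char *p = read_string_in();
    // len is converted to size_t for this comparison, so a negative len is rejected too
    if (len > sizeof buf) {
        error("length too large!");
        return;
    }
    memcpy(buf, p, len); // copies at most sizeof buf bytes
}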
User input needs validation
- Many sources of input for local applications:
- Command line arguments
- Environment variables
- Configuration files and other files
- Network packets
- Other user inputs that need validation:
- Web form input
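Validation usually means parsing the raw text strictly and rejecting anything outside the expected range. A sketch for a numeric input such as a length passed on the command line; `parse_int_in_range` is a hypothetical helper built on the standard strtol:

```c
#include <errno.h>
#include <stdlib.h>

/* Parses text as a base-10 integer and accepts it only if the whole
 * string was consumed and the value lies in [min, max].
 * Returns 1 and stores the value in *out on success, 0 otherwise. */
int parse_int_in_range(const char *text, long min, long max, long *out) {
    char *end;
    long value;

    if (text == NULL || *text == '\0') {
        return 0;                      /* missing or empty input */
    }
    errno = 0;
    value = strtol(text, &end, 10);
    if (errno == ERANGE) {
        return 0;                      /* does not fit in a long */
    }
    if (end == text || *end != '\0') {
        return 0;                      /* no digits, or trailing junk */
    }
    if (value < min || value > max) {
        return 0;                      /* outside the allowed range */
    }
    *out = value;
    return 1;
}
```

For example, a length destined for an 80-byte buffer would be parsed with `parse_int_in_range(arg, 0, 80, &len)`, so that negative, oversized, and malformed values are all refused in one place.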
NX/DEP/W^X
No-eXecute bit
- Can mark certain areas of memory as non-executable
Data Execution Prevention/Write XOR Execute
- Separation of code and data
- Force hardware-level exceptions if you try to execute those memory regions
Prevents basic stack-based exploits
Problems:
- Does not defend against return-to-libc (and other) attacks
- Can break backward compatibility with certain applications
Stack Canary
- Canary (bird) used to check if environment is safe or not safe
- Used to detect a buffer overflow before execution of malicious code can occur
- Place a random value before the return address
- Check this random value before returning
- Reactive, not preventive
An overflow that reaches the return address smashes the canary first, so checking its value reveals the attack
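Compilers insert the canary automatically, but the idea can be simulated in plain C. The sketch below models a stack frame as a struct (an assumption made so the overflow stays within one object and the program remains well defined) and uses a fixed canary value where a real implementation would use a random one:

```c
#include <stddef.h>
#include <string.h>

/* Simplified model of a frame: buffer, then canary, then return address. */
struct fake_frame {
    char buf[16];
    unsigned long canary;
    unsigned long ret_addr;
};

/* Sets the canary, copies n bytes of data starting at buf with no check
 * against sizeof buf (the "bug"), then verifies the canary as a function
 * epilogue would. Returns 1 if the canary is intact, 0 if it was smashed. */
int write_and_check(struct fake_frame *f, unsigned long canary_value,
                    const void *data, size_t n) {
    f->canary = canary_value;
    if (n > sizeof *f) {
        n = sizeof *f;  /* keep the simulation inside the struct */
    }
    memcpy((char *)f, data, n);  /* buf is the first member, offset 0 */
    return f->canary == canary_value;
}
```

A write of 16 bytes leaves the canary untouched; a write covering the whole frame cannot reach `ret_addr` without changing the canary on the way.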
Shadow Stack
- Keep an extra copy of the return address in kernel memory
- Only return if addresses match up
- Doesn’t protect other memory, registers, etc.
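The push-and-compare logic can be sketched as an ordinary data structure. This is a user-space simulation only: a real shadow stack lives in memory the attacker cannot write, and the names and fixed depth below are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

#define SHADOW_DEPTH 64  /* arbitrary depth for the sketch */

struct shadow_stack {
    uintptr_t addrs[SHADOW_DEPTH];
    size_t top;
};

/* On call: save a copy of the return address. Returns 0 if full. */
int shadow_push(struct shadow_stack *s, uintptr_t ret_addr) {
    if (s->top >= SHADOW_DEPTH) {
        return 0;
    }
    s->addrs[s->top++] = ret_addr;
    return 1;
}

/* On return: pop the saved copy and compare it with the address found on
 * the regular stack. Returns 1 only if they match. */
int shadow_check_pop(struct shadow_stack *s, uintptr_t ret_on_stack) {
    if (s->top == 0) {
        return 0;
    }
    return s->addrs[--s->top] == ret_on_stack;
}
```

A mismatch means the return address on the regular stack was modified after the call, so the program should abort rather than return.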
Address Obfuscation
- Randomize address space
- Introduce artificial diversity
- Place stack, buffers at random location
- Thus, attackers won’t know precise address to point control flow
- E.g., PaX ASLR, Windows Vista and later, etc.
PaX ASLR (Address Space Layout Randomization)
PaX ASLR is part of the PaX patches for Linux, which are a set of security enhancements to the Linux kernel. ASLR is a security feature that helps to prevent certain types of exploits, such as buffer overflows and return-to-libc attacks, by randomly arranging the positions of key data areas of a process, including the base of the executable and the positions of the stack, heap, and libraries.
Limitations of ASLR
- Several limitations:
- 16 bits of randomness is not much: only 65,536 possible layouts
- Doesn’t re-randomize on fork()
- So an attacker can retry against forked children, which share one layout, until a guess matches
- 64-bit addressing helps a lot here
- Sometimes libraries (e.g., DLLs) aren’t randomized
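The cost of guessing can be estimated with simple arithmetic. The sketch assumes layouts are uniformly distributed and that the attacker can tell whether a guess succeeded; the function names are hypothetical:

```c
#include <stdint.h>

/* Number of distinct layouts for a given number of random bits
 * (valid for bits < 64). */
uint64_t layout_count(unsigned bits) {
    return UINT64_C(1) << bits;
}

/* Layout stays fixed between attempts (no re-randomization on fork):
 * each wrong guess is ruled out, so the average is (N + 1) / 2. */
double expected_guesses_fixed(unsigned bits) {
    return ((double)layout_count(bits) + 1.0) / 2.0;
}

/* Layout is re-randomized on every attempt: each guess succeeds
 * independently with probability 1/N, so the average is N. */
double expected_guesses_rerandomized(unsigned bits) {
    return (double)layout_count(bits);
}
```

For 16 bits this gives 65,536 layouts and about 32,768 guesses on average when the layout is fixed, which is well within reach of an automated attack.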
Solutions
- Use a safe language…
- Java, Ruby, etc.
- Enforce bounds checking, garbage collection
- Type safety
- Don’t ever let programmers near the memory!
- We can even run untrusted code in a sandbox
- Other issues:
- Vulnerabilities in JVM
- Thread issues
- Load malicious libraries
- Efficiency?
- Interpreted languages aren’t always our friend
- The new frontier is finding bugs in VMs
- If you can run arbitrary “safe” code, then it gets lots of chances to work its way out